Low-resource automatic speech recognition and error analyses of oral cancer speech
Authors
Abstract
In this paper, we introduce a new corpus of oral cancer speech and present our study on the automatic recognition and analysis of oral cancer speech. A two-hour English oral cancer speech dataset is collected from YouTube. Formulating the problem as a low-resource ASR task, we investigate three acoustic modelling approaches that have previously worked well in low-resource scenarios, using two different architectures, a hybrid architecture and a transformer-based end-to-end (E2E) model: (1) a retraining approach; (2) a speaker adaptation approach; and (3) a disentangled representation learning approach (hybrid architecture only). The approaches achieve absolute word error rate reductions of (1) 4.7% (hybrid) and 7.5% (E2E); (2) 7.7%; and (3) 2.0%, respectively, compared to a baseline system which is not trained on oral cancer speech. A detailed analysis of the recognition results shows that: (1) plosives and certain vowels are the most difficult sounds to recognise in oral cancer speech, a problem successfully alleviated by the proposed approaches; (2) however, these sounds are also relatively poorly recognised in the case of healthy speech, with the exception of /p/; (3) recognition performance on certain phonemes is strongly data-dependent; (4) in terms of manner of articulation, E2E performs better with the exception of vowels; however, vowels make a large contribution to overall performance. As for place of articulation, vowels, labiodentals, dentals and glottals are better captured by the hybrid models, while E2E is better on bilabial, alveolar, postalveolar, palatal and velar information. (5) Finally, our analysis provides some guidelines for selecting words that can be used as voice commands for ASR systems for oral cancer speakers.
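The abstract reports its results as absolute word error rate (WER) reductions. As a minimal sketch of what that metric measures, the Python below computes corpus-level WER with a word-level edit distance and then the absolute reduction between two systems. It is an illustration only, not the scoring tool used in the paper; the function names and the transcripts in `baseline` and `adapted` are hypothetical and were invented for this example.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, deletions and insertions."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # match (0) or substitution (1)
            ))
        prev = curr
    return prev[-1]


def wer(pairs) -> float:
    """Corpus-level WER in percent: total errors / total reference words."""
    errors = sum(word_errors(ref, hyp) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    return 100.0 * errors / words


def absolute_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Difference in percentage points (not a relative improvement)."""
    return baseline_wer - adapted_wer


# Hypothetical (reference, hypothesis) pairs, invented for illustration.
baseline = [
    ("turn on the kitchen light", "turn on a chicken light"),
    ("call my doctor", "call doctor"),
]
adapted = [
    ("turn on the kitchen light", "turn on the chicken light"),
    ("call my doctor", "call my doctor"),
]

print(wer(baseline))                                       # 37.5
print(wer(adapted))                                        # 12.5
print(absolute_wer_reduction(wer(baseline), wer(adapted)))  # 25.0
```

In this toy case the absolute reduction is 25.0 percentage points, while the relative reduction would be about 66.7%. The figures quoted in the abstract are of the absolute kind.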
Similar resources
The effects of speech rate, prosodic features, and blurred speech on Iranian EFL learners' listening comprehension
Keywords in English: effect of speech rate on listening comprehension, blurred speech, segmental and suprasegmental features, authentic speech, intelligibility, discrimination, omission, assimilation. Abstract: The speed of listening material in connected speech has, in general, always been a nightmare for second-language learners, and especially for Iranian listeners. Despite the common-sense view that speech at a slower rate ... listening comprehension activities ...
Error Detection in Automatic Speech Recognition
We offer a supervised machine learning approach for recognizing erroneous words in the output of a speech recognizer. We have investigated several sets of features combined with two word configurations, and compared the performance of two classifiers: Decision Trees and Naïve Bayes. Evaluation was performed on a corpus of 400 spoken referring expressions, with Decision Trees yielding a high rec...
A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making a natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from speech signal becomes a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
Speech production knowledge in automatic speech recognition.
Although much is known about how speech is produced, and research into speech production has resulted in measured articulatory data, feature systems of different kinds, and numerous models, speech production knowledge is almost totally ignored in current mainstream approaches to automatic speech recognition. Representations of speech production allow simple explanations for many phenomena obser...
Journal
Journal title: Speech Communication
Year: 2022
ISSN: 1872-7182, 0167-6393
DOI: https://doi.org/10.1016/j.specom.2022.04.006